Optimization and prediction of the cotton fabric dyeing process using Taguchi design-integrated machine learning approach

The typical textile dyeing process calls for a wide range of operational parameters, and it has always been difficult to pinpoint which of these qualities is the most important in dyeing performance. Consequently, this research used a combined design of experiments and machine learning prediction models’ method to offer a sustainable and beneficial reactive cotton fabric dyeing process. To be more precise, we built a least square support vector regression (LSSVR) model based on Taguchi's statistical orthogonal design (L27) to predict exhaustion percentage (E%), fixation rate (F%), and total fixation efficiency (T%) and color strength (K/S) in the reactive cotton dyeing process. The model's prediction accuracy was assessed using many measures, including root mean square error (RMSE), mean absolute error (MAE), and the coefficient of determination (R2). Principal component regression (PCR), partial least square regression (PLSR), and fuzzy modelling were some of the other types of regression models used to compare results. Our findings reveal that the LSSVR model greatly outperformed competing models in predicting the E%, F%, T%, and K/S. This is shown by the LSSVR model's much smaller RMSE and MAE values. Overall, it provided the highest possible R2 values, which reached 0.9819.

required. Furthermore, this method may provide inaccurate outcomes, such as the belief that interaction effects cannot be observed, making it difficult to ascertain process parameters' performance 12,13 . Therefore, process optimization by systematically dyeing cotton fabric in the presence of reactive dyes is a relevant factor in commercial dyeing operations.
With this in mind, a computational tool that can optimize conditions for achieving the desired color strength of the reactive dyed cotton fabric at the lowest production cost may significantly contribute to textile dye assembly, reducing chemical inputs, process time, and economic expenses. In this scenario, the design of experiments, commonly referred to as DOE, is a technique that uses a systematic approach to establish a connection between the factors that affect the outcome of a process and the factors that affect the outcome of the process. The DOE methodology may generally be classified into two distinct categories: the complete factorial design and the Taguchi experimental design. The entire factorial design process involves evaluating and analyzing every potential combination of parameter values. When conducting research using a Taguchi experimental design, one only considers selected levels for assessment. As a result of the use of an orthogonal array (OA) architecture, the Taguchi method is regarded as a reliable approach. The OA can quantitatively determine the appropriate parameters and levels, and it is used to reduce the number of trials, the time of the experiments, the cost, and the quantity of necessary human energy [14][15][16] . Wahyudin et al. 17 employed an L9 orthogonal array design using ANOVA analysis as the primary statistical technique to improve the cotton knit fabric dying process, This research confirms that using Taguchi is critical to shortening the re-dyeing stage. According to the research of 2 a natural dye may be extracted using an L25 Taguchi methodology under ideal circumstances, and this method can then be used to color cotton fabrics. Hossain et al. 18 adopted a Taguchi design based on an orthogonal array of L9 designs to conduct the deep dyeing of cotton fabric using cacao husk extract to optimize the exhaustion percentage. While this Taguchi model is useful for analyzing data within narrow ranges, it cannot provide reliable predictions for process parameters beyond those limits.
In accelerating the performance of statistical optimization techniques, machine learning (ML) is being used in various fields to speed up these operations and minimize the time and price of simulation. Least square support vector regression models (LSSVRs) are an example of a machine learning model that may get insights directly from data and comprehend how they operate with a wide range of inputs. Most notably, the LSSVR model shows excellent predictive power in determining the expected value of the target output variable in the context of nonlinear data. The LSSVR model is a nonlinear prediction method based on support vector machine theory (SVM). When it comes to decreasing the computational burden associated with reducing the number of viable classes, LSSVR delivers a more efficient response than the SVM by applying a separate set of linear equations in dual space 19,20 . In recent years, academics from a broad range of disciplines have devoted increasing amounts of attention and interest throughout the course of many years. As such, the revolution of the textile industry 4.0 concept is strongly in line with the adaptation of machine learning so as to maintain sustainability and competence in the market 21 . With the inclusion of ML, textile industries' production, especially dyeing sections, can greatly benefit from human interaction with the interface of intelligent computer-directed tools to enhance production effectiveness. Previous studies have emphasized the importance of ML in the textile industry; for example, Ribeiro et al. 22 applied automated machine learning in the textile design and finishing features to predict the physical properties of cotton woven fabric. Tsai 23 underlines sustainable production planning and control strategies using machine learning programming to develop the textile industry 4.0 approach. Similarly, He et al. 24 used a machine learning-based decision support system for controlling the textile manufacturing process. However, the application of machine learning for the textile dyeing process is rarely reported, which deserves further investigation.
The Taguchi method and machine learning are two different approaches used in various fields. This study uses Taguchi Method to design experiments focusing on parameter optimization and robust design. Even Taguchi based on a table can predict other data, but the Taguchi model is useful for analyzing data within narrow ranges, it cannot provide reliable predictions for process parameters beyond those limits. Hence, this study uses the machine learning model based on Taguchi's statistical orthogonal design to make the prediction. Given it, the present study combined the Taguchi orthogonal model with the LSSVR model for predicting the cotton fabric coloration properties via an industrial dyeing procedure for the first time in the literature. After that, the accuracy of the LSSVR is figured out by calculating the coefficient of determination, also known as R 2 , the root mean square error (RMSE), and the absolute mean error (MAE). In addition, the results are compared to those generated by employing models of the fuzzy model, principal component regression (PCR), and partial least square regression (PLSR) to gain a deeper comprehension of the predictive qualities these models possess.

Experimental
Materials and reagents. This study was conducted with 100% pure bleached cotton plain-weave fabric (140 g m −2 , yarn count 40s Ne) obtained from Jiangnan Group Co., Ltd., China. Commercial dye C. I. Reactive Blue 194 (Table S1) was bought from Shanghai Jiaying Chemical Company, China. The nonionic detergent known as Luton 500 was acquired from the Dalton UK Company. The electrolyte (NaCl, 99.5%) and the alkali (Na 2 CO 3 , 99.8%) arrived from Sinopharm Chemical Reagent Co., Ltd., in Shanghai, China, 200002. Besides these chemicals, all other reagents and chemicals used in the experiment were typically laboratory-quality items.
Computer-aided dyeing process. Taguchi-based optimization approach. Before beginning the Taguchi analysis, we established the response functions and the independent, manipulated variables. After that, the number of levels for the independent variables was decided. Minitab®17 statistical software was used to develop Taguchi's orthogonal array-based design of experiments (OA L 27 ) (Minitab Inc, Coventry, UK). Tables S2 and S3 provide a detailed overview of the experimental design used to conduct the 27 separate tests across 6 different variables and 3 different factor levels 15 (Fig. 1). The dyebath was prepared by adding sodium chloride at 20 °C, heating, and sodium carbonate. The soaping procedure washed away unfixed dyes. Nonionic detergent (Luton 500, Dalton UK Co.), 2 g L −1 , was used in the rotary infrared sample dyeing machine's soap washing procedure at a material-to-liquor ratio of 1:10 and a temperature of 95 °C for 15 min (Fig. S1). After being washed with soap, the coloured samples were dried in an oven at 80 °C for 30 min.
Machine learning approach. A total of 27 datasets were adopted from the dyeing process parameters of cotton fabric. These parameters included the dye dosage, dye-fixing temperature, dye-fixing time, dyebath pH, material-to-liquor ratio, salt concentration, and their responses, which included exhaustion percentage (E%), fixation rate (F%), total fixation efficiency (T%), and color strength (K/S). Once the data was uploaded into MATLAB, it was split 80/20 between the training and testing sets (Table S3). Figure 2 shows the general layout of the four different regression models (PCR, PLSR, Fuzzy approach, and LSSVR) used to predict E%, F%, T%, and K/S values. Models like PCR, PLSR, the fuzzy method, and LSSVR are developed using the training data, as shown in Fig. S2. The models' progress was evaluated using the same models used during training and testing; they included the PCR, PLSR, Fuzzy method, and LSSVR models. The root-mean-square error (RMSE), mean absolute error (MAE), and correlation coefficient (R 2 ) values of each available regression model were calculated and compared. A rundown of the many parameter configurations for the PCR, PLSR, Fuzzy approach, and LSSVR models is provided in Table 1. The notations N T , N 1 , and N 2 refer to the total number of datasets used for training, the number of datasets used for testing, and the number of latent variables. Meanwhile, the tuning parameters used in the LSSVR model are represented by the symbols γ, λ, and p. These symbols are referred to as  www.nature.com/scientificreports/ the LSSVR model parameters. The datasets for this cotton fabric dyeing process are available in the supplementary file. Moreover, even though the available models were used, the dataset is still required to train the available models to develop the models, particularly for this cotton fabric dyeing process.
Analysis of prediction behaviour. The root-mean-square error, often known as RMSE, is a scale-dependent defect metric that significantly determines whether a prediction model is effective 25 . Because it allowed comparisons across alternative configurations for a single variable, this metric was used to determine whether or not the data splitting ratio was appropriate. To put it another way, root-mean-square error (RMSE) is a statistic that assesses how far a model strays from the correct answers, with a lower value suggesting that a prediction has a higher degree of accuracy. The RMSE is calculated by taking the root square of the total squared differences between the actual output and the projected output 25 . Alternatively, it might be seen as an indicator of the disparities between the expected values and those observed. When this is considered, a lower RMSE suggests greater accuracy and ability to forecast outcomes. The RMSE formula is shown here as Eq.
The Y i shown here signifies the actual output, whereas the Ŷ i denotes the predicted output, and n is the total number of samples taken.
The MAE, as shown in Eq. (2), is a statistic that considers neither the directionality nor the severity of errors when evaluating a set of predictions. This value is the weighted mean of the absolute differences between the expected and actual observations for all the test data set observations.
where represents the summation.
The coefficient of determination, or R 2 , measures how well a regression model accounts for a certain dataset's variation in the target variable 25 . R 2 is a statistical measure of the "goodness of fit" between a regression model's observed and expected values. Its worth may range from zero to one 27 . If it's near one, the inputs selected should produce the desired output; if it's farther away, the fit might need some work. The coefficient of determination (R 2 ) is found by comparing the sum of squared errors to the sum of squared deviations from the mean of the variable in question. The level of similarity between observed and predicted data is quantified by a statistic called R 2 . Full details of the formula are given in Eq. (3) 28 .
More specifically, Eq. (4) 29 provides a mathematical description of the prediction error (PE) that is put to use. We investigate the error of approximation, indicated by Ea, and compute Ea using Eq. (5) 30 to provide a quantitative demonstration of the predictive abilities of E%, F%, T%, and K/S values.
where V 1 and V 2 represent the target and actual values, respectively. In training and testing datasets, RMSE 1 and RMSE 2 represent the RMSE, while MAE 1 and MAE 2 represent the MAE.

Measurements and characterization.
With the use of a UV-visible spectrophotometer (TU-1900 UV-Vis, PERSEE, China), the light absorbance values of the dyebath solution before and after dyeing, as well as the leftover soaping solution, were measured and recorded at the wavelength of maximum absorption, which was 560 nm. Using Eqs. (6)-(8) 18 , we determined the dye exhaustion percentage (E%), fixation rate (F%), and total fixation efficiency (T%). www.nature.com/scientificreports/ where the light absorbances of the dyebath before dyeing, after dyeing, and the remaining soaping solution is denoted by Ao, At, and As, respectively. A reflectance spectrophotometer was used to evaluate the color strength, denoted by the K/S ratio, of dyed fabric samples taken randomly from twenty different locations (CHN-Spec CS-650A, Hangzhou Color Spectrum Technology Company, China). The value of the color strength was determined by measuring it at the wavelength at which the dye absorbed the lightest, and the value of the color strength as an average has been presented here.
The surface morphology of the undyed and dyed fabrics was analyzed using Phillips's scanning electron microscope instrument (FE-SEM, Germany). The sample preparation for scanning electron microscopy was carried out as usual. The samples were individually fastened on a standard sample holder after the fabrics were trimmed to sizes no bigger than 1 cm 2 each. 10 kV was the acceleration voltage, while the working distance was between 15 and 17 mm. A coating of gold was sputtered onto the surfaces of the samples before they were analyzed using SEM. A Nicolet iS5 FT-IR spectroscopy (Thermo Fisher Scientific, USA) was used in order to perform an FTIR study on the undyed and dyed fabrics. In order to investigate the crystallinity behaviour, the X-ray diffraction patterns of the undyed and RB 194-dyed cotton fabric were measured using a Rigaku Ultima III X-ray diffractometer (Tokyo, Japan). The materials were meticulously scissored down to a powdery consistency in preparation for the examination. After that, the powder was placed in a laboratory hydraulic pressing machine and subjected to a static pressure of 127 MPa for sixty seconds to create a circular disc. The sample disc was then put on the holder of the equipment, and the diffraction pattern was recorded with a 2θ angle range from 10 to 80 o , having a step size of 0.02° under CuKα radiation (λ = 1.54056 Å).

Results and discussion
Dyeing process optimization. Process optimization is important in controlling the process to obtain the targeted results. In general, textile dyeing is a complex process equipped with a manufacturing plant functionoriented and requires various steps to maintain the operation performance. According to the Taguchi design (L 27 ) (Table S3) in Minitab software, an experiment was conducted to ensure the repetitiveness of the dyeing process. Based on the S/N ratio analysis, optimized parameters for each response, i.e., E% ( Characterization. The characterization details were measured for the undyed and optimized samples to gain better insights into the dyeing process. The scanning electronic microscope (SEM) was used to investigate the surface morphology of undyed and optimized RB 194-dyed cotton fabrics. The undyed cotton fabric had a seemingly smooth surface patterned with distinctive grooves unique to cotton fibers (Fig. 3a). As a comparison, an SEM image of cotton fabric dyed with RB 194 under optimized conditions showed a similar structure (Fig. 3b), with a smoother surface than the cotton fabric dyed with the undyed sample. This suggests that the dyeing process did not create any discernible alterations in the cotton fabric 31 .
The variety of functional groups in both the undyed and conditionally optimized RB 194-dyed sample was analyzed using Fourier transform infrared spectroscopy. For the undyed sample (Fig. 4a), a dominating peak at 3423 cm −1 may indicate an O-H stretching vibration. Subsequently, the asymmetric stretching vibration of CH 2 groups may account for a peak of about 2920 cm −1 . The C=C stretching vibration of aromatic groups corresponds to the absorption peak found at 1652 cm −132 . The possibility exists that the C-C stretching vibration and the C-H bend stretching vibration are both present in the examined sample since there is a peak at 1458 cm −133 . The existence of a C-O stretching vibration can be indicated by the peak at 1058 cm −134 . Considering the optimized conditioned RB 194-dyed sample, the primary characteristic peaks were still intact, and the dyeing procedure did not develop any new peaks, suggesting that the aqueous reactive dyeing medium was appropriate for cotton fabric.  35 . The XRD patterns show that the dyeing procedure did not affect the crystallinity of the cotton. During dyeing, the cellulosic fibers were swollen well, which swelling only occurred in the amorphous zone. Thus, the reactive blue 194 dyes were adsorbed in the amorphous area and the surface of the crystallinity zone. Subsequently, the dyes were fixed with the free hydroxyl groups of cellulosic fiber under the fixation conditions. In order to promote the dyeability of cotton fiber, caustic mercerlization and liquid ammonia treatment 7 were widely applied because both treatments can damage the crystalline structure of the cotton fiber, i.e., an increase of the amorphous zone, thereby increasing the dye adsorption and fixation rate 36 .
The Reactive Blue 194 was adsorbed in cotton fiber and then fixed in cotton fiber via a covalent bond between the reactive group of dye and the hydroxyl group of cellulose after the addition of alkali at the target temperature. The dye fixation mechanism is shown in Fig. 5. Reactive Blue 194 (component a in Fig. 5) is a bifunctional reactive dye with one monochlorotriazinyl group and one vinyl sulphone sulfate; the latter is more reactive than the former 37 . After the addition of alkali, many hydroxyl groups of cellulose (component b in Fig. 5) were changed to cellulosate group (cellulose-O − ), which is more nucleophilic; meanwhile, the reactive groups became excited. During the dye fixation, the vinyl sulphone sulfate was transferred to the vinyl sulphone group and then reacted with cellulosate to form a covalent bond (component c in Fig. 5). The monochlorotriazinyl group also covalently  www.nature.com/scientificreports/ bonded with cellulosate (component d in Fig. 5). Also, both reactive groups may form covalent bonds with cellulosate (component e in Fig. 5). These covalent bonds contributed to the high performance of the colorfastness to washing dyed cotton fabric 38 .

Principal component analysis for feature selection. Principal component analysis (PCA) is a dimen-
sional deduction method for unsupervised feature selection based on eigenvectors analysis computing principal components to identify critical original features 39 . In other words, PCA is a method for feature selection that selects variables according to the magnitude (from biggest to smallest in absolute values) of their PCA coefficients or loadings. These coefficients or loadings are the covariances or correlations between the original variables. The larger values denote that a variable has a stronger effect on that principal component and is more important to the corresponding variable 40 . Based on the results from PCA, the variances for the input variables involving the dye dosage, dye-fixing temperature, dye-fixing time, dyebath pH, material-to-liquor ratio, and salt concentration are 276.9231, 69.2308, 69.2308, 2.7692, 0.6923, and 0.0017, respectively. Therefore, the first four input variables, dye dosage, dye-fixing temperature, dye-fixing time, and dyebath pH, are important and capture 99.83% of the variation. Figure 6 depicts the PCA biplot illustrating how strongly each characteristic influences a principal component. This PCA biplot explores and visualizes the relationship and correlation between the variables 41 . From Fig. 6, it can be seen obviously that the scales of the first four variables, including the dye dosage, dye-fixing temperature, dye-fixing time, dyebath pH, material-to-liquor ratio, and salt concentration, appear relatively higher (have a long distance from the origin). These results are identical to the calculated variances from that PCA in which they indicate that these variables have a significant portion of the variance in the data. Moreover, the principal component scores and loadings for the first two principal components are illustrated in Fig. 6. The axes show the principal component scores, and the vectors are the loading vectors that represent the loading pair per the original variable. It can simply be said that most variables, including the material-to-liquor ratio, pH, temperature, salt concentration, colour strength, and exhaustion percentage, are in the same direction as PC1; hence, they positively correlate with PC1. Meanwhile, the time and dye dosage have the same direction as PC2, indicating they are positively correlated with PC2.
Modelling assessment. Estimates of the dyed cotton fabric's qualities, such as E%, F%, T%, and K/S values, may be calculated using predictive modelling tools, such as the LSSVR model, combined with the dyeing process parameters. In the past, there have only been a few studies that focused on modelling the dying process within the context of a statistical design 18,42,43 . Nevertheless, these predictions can only be made using this method within the bounds of the data that has been supplied. Therefore, in this research, various machine learning models, such as LSVVR, Fuzzy approach, PLSR, and PCR, were developed to overcome the drawbacks of this mathematical modelling methodology by using the characteristics associated with the dyeing process.
The results of the previously described machine-learning models for E%, F%, T%, and K/S, respectively, are summarized in Tables 2, 3, 4, and 5 for convenience. In this work, a model's performance was quantified using three widely used error metrics, including RMSE, MAE, and R2, to exclude any possibility of bias in the evaluation of the model's performance 44 . Tables 2, 3, 4, and 5 show that the training data's error metrics are RMSE 1 , MAE 1 , R 1 2 , whereas the testing data's error metrics are RMSE 1 , MAE 1 , and R 2 2 . According to Table 2, the LSSVR model performed the best compared to the other models, even though its R 2 is slightly lower than the fuzzy method for E% in its predictions. The PCR and PLSR models came in second www.nature.com/scientificreports/ www.nature.com/scientificreports/ and third, respectively, particularly for those models' training data applications. Compared to the other models, the LSSVR model's RMSE and MAE values are 91% to 107% lower than those of the other models (as shown in Table 2). Consequently, the LSSVR model continues to provide the best overall results. The LSSVR model has R 2 values that are, except the R 1 2 value in Table 2, − 650% better than those of the Fuzzy technique, the PCR model, and the PLSR model.
In addition, Table 3 shows that the F% predictions made by the LSSVR model were the most accurate, followed by those made by the Fuzzy technique, the PLSR model, and the PCR model. In Table 3, the LSSVR model has lower RMSE and MAE values (by 19% to 425%) and higher R 2 (by − 1% to 842%) than the Fuzzy approach, PCR, and PLSR models. Further, Table 4 shows that the LSSVR model, followed by the Fuzzy technique, the PLSR model, and the PCR model, has the best T% prediction accuracy. Table 4 demonstrates that compared to the Fuzzy method, PCR, and PLSR models, the LSSVR model has lower RMSE and MAE values by 13% to 353% and a higher R 2 by 0% to − 411%. In a similar vein, the LSSVR model fared better than the other models for K/S shown in Table 5 since it had lower RMSE and MAE values (by 65% to 147%, respectively) and a higher R 2 (by − 4% to − 781%, respectively) than the Fuzzy approach, PCR, and PLSR models.
The LSSVR model outperformed the other models in Tables 2, 3, 4 and 5 because it includes an extra modelthe leave-one-out cross-validation model-that finds the optimal tuning parameters for prediction 29 . Furthermore, the nonlinear experimental data is mapped onto a higher dimensional space that may produce a better forecast thanks to the LSSVR model's usage of the renowned kernel function, the radial basis function. Also, besides the LSSVR model, the rest of the models have very low R 2 values for both training and testing datasets, indicating that these models performed poorly for this cotton fabric dyeing process. The R 2 values for the LSSVR model are between 0.6606 and 0.9819, which are higher than the benchmarked or acceptable R 2 value reported by Ozili 45 and Veerasamy et al. 46 .
Tables 2, 3, 4 and 5 show that the Fuzzy approach implemented in MATLAB's fuzzy logic designer program yields superior results when used on training data. However, the fuzzy technique had the worst results for testing data, with R 2 2 for E%, F%, T%, and K/S all being negative. This indicates the fuzzy method has a very poor prediction ability. According to Satrio et al. 47 , negative R 2 values indicate a large gap between the observed and predicted values; in this case, the fuzzy technique in this investigation yielded extremely dissimilar results. This is because the training data creates rules and criteria that form the basis of the fuzzy method's membership function, fuzzy logic operators, and if-then statements 19 . An assortment of fuzzy rules, a database detailing the membership functions used by the fuzzy rules, and a reasoning mechanism outlining the inference path upon the rules to adopt projected data are the three parts of this framework 48 . As a consequence, the fuzzy technique fails to provide satisfactory results if the testing data differs from the training data. Tables 2, 3, 4 and 5 show that the findings from the PCR and PLSR models are quite comparable, although the PLSR model incorporates both the input and output variables. In contrast, the PCR model only uses the input variables 29,49 . The similarity in their findings might be attributed to the assumption that the input factors are equally crucial to the prediction accuracy 27 . Both the PCR and the PLSR models were superior to the fuzzy technique when dealing with the testing data, although producing comparable outcomes. This is because PCR and PLSR models are dimension deduction techniques, which means they utilize the training data to fit one or more response variables (predictability variables) 50 . From Tables 2, 3, 4 and 5, notice that there is no condition where the model performed well on the training data but did not perform well on the testing data 51 . Hence, it can be assured that no model is overfitted.
Moreover, from Tables 2, 3, 4 and 5, the Fuzzy method, PLSR, and PCR gave negative R 2 values, which indicate negative correlations, where the values of one variable tend to increase when the values of the other variable decrease 52 . Also, these negative R 2 values mean that the Fuzzy method, PLSR, and PCR performed poorly and are unsuitable for the cotton fabric dyeing process.
Predictivity assessment. In addition to assessing the model's performance using three error metrics, Fig. 7a-h with error bars for the actual data compare the prediction results for E%, F%, T%, and K/S from the PCR, PLSR, Fuzzy approach, and LSSVR models on the training and testing data. Figure 7a through f show that, in comparison to the fuzzy technique, the PLSR model, and the PCR model, the LSSVR model's projected data is more similar to the actual data. Predictive power increases as the degree of similarity between predicted and observed data decrease 53 . Moreover, from these Fig. 7a-h, it can also be seen that the predicted data from the LSSVR, especially for training data in Fig. 7a,c,e,g are mostly within or close to the error bars. However, the predicted data from the fuzzy technique, the PLSR, and the PCR are outside or far from the error bars. These figures can also conclude that the LSSVR model's superiority in forecasting E%, F%, T%, and K/S is validated beyond a reasonable doubt.
Accuracy of the models. The model's correctness in terms of coefficients was also assessed using the same datasets (R 2 ). Using both training and test data, the LSSVR model predicts values for E%, F%, T%, and K/S, and these values are compared to the actual values in Fig. 8a-h. The higher the model's predictive ability, the closer the point is to line 54 . Each data point in Fig. 8a through h is rather near the line. This indicates that LSSVR's anticipated outputs are consistent with the tested data for all responses. Good concordance between the model's predicted values and the actual experimental findings is indicative of the model's validity 55 . Important characteristics for optimizing the dyeing performance on cotton fabric, such as E%, F%, T%, and K/S, may be predicted using the LSSVR model.   Figure 7. Comparison of predictive results from PCR, PLSR, Fuzzy method, and LSSVR models (a) prediction of E% using training data, (b) prediction of E% using testing data, (c) prediction of F% using training data, (d) prediction of F% using testing data, (e) prediction of T% using training data, and (f) prediction of T% using testing data, (g) prediction of K/S using training data, (d) prediction of K/S using testing data.  8. Correlation between the actual and predicted values from the LSSVR model (a) E% using training data, (b) E% using testing data, (c) F% using training data, (d) F% using testing data, (e) T% using training data, (f) T% using testing data. (g) K/S using training data, (h) K/S using testing data. www.nature.com/scientificreports/ conditions are just some of the many advantages that manufacturing companies may reap from adopting Industry 4.0. Due to these promising implications, it has attracted much interest from academics and professionals 56 . Emerging in the last several years, both Industry 4.0 and sustainability1 are technical and organizational advancements that are affected by or contribute to increased productivity and environmentally responsible manufacturing. Global competitiveness, fluctuating markets, demand for increasing customization through communication, information, and intelligence, and shrinking innovation and product life cycles are some of the modern difficulties that Industry 4.0 technologies aim to address. Significant opportunities and challenges for sustainable organizational and societal growth are associated with using Industry 4.0 technology. In terms of the bottom line, the advantages of flexible manufacturing include quicker set-up and delivery times, lower labor and material costs, more production flexibility, better productivity, and improved customization 57 .
From a green perspective, energy and resource consumption may be lowered using Industry 4.0 technology for detection and data analysis throughout manufacturing and supply chain operations. Carbon footprint studies that are data-driven and auditable may help reduce waste and greenhouse gas emissions. Disassembling the individual parts of a product is possible by reusing, recycling, or remanufacturing it 58 .
To address the current trend of research, this study employs the Taguchi optimization technique to improve the precision of the dyeing process, in tandem with machine learning to consider the potential constraints of production and impact on attaining maximum security toward sustainable textile industry 4.0 development (Fig. 9).

Conclusions
The current study developed a novel combined approach LSSVR model, entitled the Taguchi-integrated LSSVR model, for predicting the exhaustion percentage (E%), fixation rate (F%), and total fixation efficiency (T%), and color strength (K/S) of reactive dyed cotton fabric. The experimental results from the dyeing process were used to inform the development of this integrated model, which takes the dye dosage, dye-fixing temperature, dye-fixing time, dyebath pH, material-to-liquor ratio, and salt concentration as input. Finding the optimal circumstances for the dyeing method is crucial for optimizing the cotton fabric's E%, F%, T%, and K/S. Predictive modeling techniques, such as LSSVR, play an important role in the procedures required to produce reactive dyed cotton textiles of the appropriate quality. Compared to the Fuzzy method, PCR, and PLSR models, the LSSVR model performed significantly better in predicting the E%, F%, T%, and K/S, as evidenced by its RMSE and MAE values being 91%-107%, 19% to 425%, 13% to 353% and 65% to 147% lower, respectively, while its R 2 values were − 650%, − 1% to 842%, 0% to − 411%, and − 4% to − 781% higher, respectively. Results suggest that the LSSVR model might be used as a prediction tool in the textile industry during the dyeing process stage. It was found that the LSSVR model has not been utilized for the cotton fabric dyeing process. Hence, the LSSVR model was introduced in this study. The results of this study show that LSSVR is suitable for the cotton fabric dyeing process, and it is recommended for other similar cases. More study is needed to determine whether or not incorporating another algorithm into the LSSVR model would enhance its prediction ability. It is also important to pay more attention to the new features and expenses connected with improving the dyeing process for cotton textiles as a prospect of the sustainable textile industry 4.0 framework.

Data availability
The datasets generated during the current study are available from the corresponding author on reasonable request (Prof. Lina Lin, L. Lin), while the data set is uploaded to https:// github. com/ pzczxs/ MLSSVR.